# Changelog

> For the complete changelog, please refer to: [Midscene Releases](https://github.com/web-infra-dev/midscene/releases)

## v0.30 - 🎯 Cache management upgrade and mobile experience optimization

### 🎯 More flexible cache strategy

v0.30 improves the cache system, allowing you to control cache behavior based on actual needs:

- **Multiple cache modes available**: Supports read-only, write-only, and read-write strategies. For example, use read-only mode in CI environments to reuse cache, and use write-only mode in local development to update cache
- **Automatic cleanup of unused cache**: Agent can automatically clean up unused cache records when destroyed, preventing cache files from accumulating
- **Simplified unified configuration**: Cache configuration parameters for CLI and Agent are now unified, no need to remember different configurations
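
The three modes differ only in whether a run may read existing records and whether it may write new ones. The sketch below models that distinction, plus the cleanup-on-destroy idea, as plain functions. It is not Midscene's implementation: the `CacheStrategy` type, the helper names and the `CI` environment check are assumptions for illustration, so see the caching documentation for the real option names.

```typescript
// Hypothetical model of the cache strategies; not Midscene's source code.
type CacheStrategy = 'read-only' | 'write-only' | 'read-write';

interface CachePermissions {
  read: boolean; // may reuse existing cache records
  write: boolean; // may create or update cache records
}

const PERMISSIONS: Record<CacheStrategy, CachePermissions> = {
  'read-only': { read: true, write: false },
  'write-only': { read: false, write: true },
  'read-write': { read: true, write: true },
};

function cachePermissions(strategy: CacheStrategy): CachePermissions {
  return PERMISSIONS[strategy];
}

// Example policy: reuse the cache in CI, refresh it during local development.
function pickStrategy(env: Record<string, string | undefined>): CacheStrategy {
  return env.CI ? 'read-only' : 'write-only';
}

// Cleanup on destroy: keep only the records that were actually used in this run.
function pruneUnused<T extends { id: string }>(records: T[], usedIds: Set<string>): T[] {
  return records.filter((record) => usedIds.has(record.id));
}

console.log(cachePermissions(pickStrategy({ CI: 'true' }))); // { read: true, write: false }
console.log(pruneUnused([{ id: 'login' }, { id: 'search' }], new Set(['login'])));
// [ { id: 'login' } ]
```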

### 📊 Report management convenience

- **Support for merging multiple reports**: Beyond Playwright scenarios, all scenarios now support merging multiple automation execution reports into a single file for centralized viewing and sharing of test results

### 📱 Mobile automation optimization

#### iOS platform improvements
- **Real device support improvement**: Removed the simctl check restriction, making iOS real-device automation smoother
- **Auto-adapt device display**: Implemented automatic device pixel ratio detection, ensuring accurate element positioning on different iOS devices

#### Android platform enhancements
- **Flexible screenshot optimization**: Added `screenshotResizeRatio` option, allowing you to customize screenshot size while ensuring visual recognition accuracy, reducing network transmission and storage overhead
- **Screen info cache control**: Use `alwaysRefreshScreenInfo` option to control whether to fetch screen information each time, allowing cache reuse in stable environments to improve performance
- **Direct ADB command execution**: AndroidAgent added `runAdbCommand` method for convenient execution of custom device control commands
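
To make the two screenshot-related Android options concrete, here is a hedged sketch of what they control: scaling each side of the screenshot by a ratio, and either reusing or re-fetching screen information. Only the names `screenshotResizeRatio` and `alwaysRefreshScreenInfo` come from this changelog; `ScreenInfoProvider`, `resizedSize` and the rounding choice are hypothetical.

```typescript
// Hypothetical model of the two Android options; not Midscene's source code.
interface ScreenInfo {
  width: number;
  height: number;
}

// screenshotResizeRatio: scale each side of the screenshot, e.g. 0.5 halves it.
function resizedSize(screenInfo: ScreenInfo, screenshotResizeRatio: number): ScreenInfo {
  if (!(screenshotResizeRatio > 0)) {
    throw new RangeError('screenshotResizeRatio must be positive');
  }
  return {
    width: Math.round(screenInfo.width * screenshotResizeRatio),
    height: Math.round(screenInfo.height * screenshotResizeRatio),
  };
}

// alwaysRefreshScreenInfo: re-query on every call, or reuse the first result.
class ScreenInfoProvider {
  fetchCount = 0;
  private cached: ScreenInfo | undefined;
  private readonly fetchScreenInfo: () => ScreenInfo;
  private readonly alwaysRefresh: boolean;

  constructor(fetchScreenInfo: () => ScreenInfo, alwaysRefreshScreenInfo: boolean) {
    this.fetchScreenInfo = fetchScreenInfo; // e.g. an adb query on a real device
    this.alwaysRefresh = alwaysRefreshScreenInfo;
  }

  get(): ScreenInfo {
    let info = this.cached;
    if (this.alwaysRefresh || info === undefined) {
      info = this.fetchScreenInfo();
      this.cached = info;
      this.fetchCount += 1;
    }
    return info;
  }
}

const provider = new ScreenInfoProvider(() => ({ width: 1080, height: 2400 }), false);
provider.get();
const info = provider.get(); // served from the cache
console.log(provider.fetchCount); // 1
console.log(resizedSize(info, 0.5)); // { width: 540, height: 1200 }
```

Reusing the cached value is only safe while the screen size and orientation stay fixed, which is why the changelog frames it as an option for stable environments.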

#### Cross-platform consistency
- **ClearInput support on all platforms**: Solves the problem of AI being unable to accurately plan clear input operations across platforms

### 🔧 Feature enhancements

- **Failure classification**: CLI execution results can now distinguish between "skipped failures" and "actual failures", helping locate issue causes
- **aiInput append mode**: Added `append` option to append input while preserving existing content, suitable for editing scenarios
- **Chrome extension improvements**:
  - Popup mode preference saved to localStorage, remembering your choice on next open
  - Bridge mode supports auto-connect, reducing manual operations
  - Support for GPT-4o and non-visual language models
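
The difference between the default behavior and `append` mode can be modeled as a pure function over the field's current text. `applyInput` is a hypothetical helper that illustrates the semantics only. It is not a Midscene API, so check the `aiInput` reference for how the option is actually passed.

```typescript
// Hypothetical model of aiInput's default vs. append behavior; not Midscene's API.
interface InputOptions {
  append?: boolean; // keep the existing content and add the new text after it
}

function applyInput(current: string, value: string, options: InputOptions = {}): string {
  // Default: the field is cleared first, so only the new value remains.
  return options.append ? current + value : value;
}

console.log(applyInput('Hello', ', world', { append: true })); // Hello, world
console.log(applyInput('Hello', 'Goodbye')); // Goodbye
```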

### 🛡️ Type safety improvements

- **Zod schema validation**: Introduced type checking for action parameters, detecting parameter errors during development to avoid runtime issues
- **Number type support**: Fixed `aiInput` support for number type values, making type handling more robust
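
The changelog says action parameters are now checked with Zod schemas. To stay dependency-free, the sketch below hand-rolls the same idea instead of using Zod: validate parameters before the action runs, and accept a numeric `value` by converting it to text. The `InputParams` shape and `validateInputParams` are hypothetical.

```typescript
// Dependency-free stand-in for a schema check; the changelog says Midscene uses Zod.
interface InputParams {
  value: string;
  locate: string;
}

function validateInputParams(raw: unknown): InputParams {
  if (typeof raw !== 'object' || raw === null) {
    throw new TypeError('params must be an object');
  }
  const { value, locate } = raw as Record<string, unknown>;
  if (typeof value !== 'string' && typeof value !== 'number') {
    throw new TypeError('value must be a string or a number');
  }
  if (typeof locate !== 'string' || locate.trim() === '') {
    throw new TypeError('locate must be a non-empty string');
  }
  // Numbers are accepted and normalized to text before typing, e.g. 42 -> "42".
  return { value: String(value), locate };
}

console.log(validateInputParams({ value: 42, locate: 'Age field' }));
// { value: '42', locate: 'Age field' }

try {
  validateInputParams({ value: true, locate: 'Age field' });
} catch (err) {
  console.log((err as Error).message); // value must be a string or a number
}
```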

### 🐞 Bug fixes

- Fixed potential issues caused by Playwright circular dependencies
- Fixed issue where `aiWaitFor` as the first statement could not generate reports
- Improved video recorder delay logic to ensure the last frame is captured
- Optimized report display so that error information and element positioning information can be viewed at the same time
- Fixed issue where `cacheable` option in `aiAction` subtasks was not properly passed

### 📚 Community

- Added the [midscene-java](./awesome-midscene.md) community project to the Awesome Midscene section

## v0.29 - 📱 iOS platform support added

### 🚀 iOS platform support added
The biggest highlight of v0.29 is official iOS platform support! You can now connect to and automate iOS devices through WebDriver, extending Midscene's AI automation capabilities to the Apple ecosystem. For details, see [Support iOS automation](./blog-support-ios-automation).

### 🚅 Qwen3-VL model adaptation

We've adapted the latest Qwen `Qwen3-VL` model, giving developers faster and more accurate visual understanding capabilities. See [Choose an AI model](./choose-a-model).

### 🤖 AI core capability enhancement

- **UI-TARS Model Performance Optimization**: Optimized aiAction planning, improved dialogue history management, and provided better context awareness capabilities
- **AI Assertion and Action Optimization**: We updated the prompt for `aiAssert` and optimized the internal implementation of `aiAction`, making AI-driven assertions and action execution more precise and reliable

### 📊 Reporting and debugging experience optimization
- **URL Parameter Playback Control**: To improve debugging experience, you can now directly control the default behavior of report playback through URL parameters

### 📚 Documentation
- Updated documentation deployment cache strategy to ensure users can access the latest documentation content in time

## v0.28 - 📱 Build your own GUI automation agent by integrating with your own interface (preview feature)

### 🚀 Support for integration with any interface (preview feature)

v0.28 introduces the capability to integrate with your own interface. Define an interface controller class that conforms to the `AbstractInterface` definition, and you can get a fully-featured Midscene Agent.

The typical use case for this feature is building a GUI automation Agent for your own interface, such as IoT devices, in-house applications, or car displays!

Combined with the universal Playground architecture and SDK enhancement features, developers can conveniently debug custom devices.

For more information, please refer to [Integrate with Any Interface (Preview Feature)](./integrate-with-any-interface.mdx)
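
The real contract is `AbstractInterface`, described in the guide linked above. Purely to illustrate the idea, the sketch below uses a hypothetical, much-simplified `DeviceInterface` (size, screenshot, tap) and implements it for an imaginary in-memory panel. The method names and shapes are assumptions and should not be read as Midscene's API.

```typescript
// Hypothetical, simplified contract; NOT Midscene's AbstractInterface.
interface DeviceInterface {
  interfaceType: string;
  size(): Promise<{ width: number; height: number }>;
  screenshotBase64(): Promise<string>;
  tap(x: number, y: number): Promise<void>;
}

// An imaginary in-memory device, standing in for an IoT panel or car display.
class FakePanel implements DeviceInterface {
  interfaceType = 'fake-panel';
  taps: Array<[number, number]> = [];

  async size() {
    return { width: 800, height: 480 };
  }

  async screenshotBase64() {
    // A real device would return an encoded screen capture here.
    return 'data:image/png;base64,';
  }

  async tap(x: number, y: number) {
    const { width, height } = await this.size();
    if (x < 0 || y < 0 || x >= width || y >= height) {
      throw new RangeError(`tap (${x}, ${y}) is off-screen`);
    }
    this.taps.push([x, y]); // a real device would send the touch event here
  }
}

async function demo() {
  const panel = new FakePanel();
  await panel.tap(100, 200);
  console.log(panel.taps); // [ [ 100, 200 ] ]
}

demo();
```

The point of the pattern is that the agent only talks to the interface methods, so any device that can take a screenshot and accept input events can sit behind it.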

### 📱 Android platform optimization
- **Planning Cache Support**: Added planning cache functionality for Android platform, improving execution efficiency
- **Input Strategy Enhancement**: Optimized input clearing strategy based on IME settings, improving Android platform input experience
- **Scroll Calculation Improvement**: Optimized scroll endpoint calculation algorithm for Android platform

### 👆 Gesture operation extension
- **Double-Click Operation Support**: Added support for double-click actions
- **Long Press and Swipe Gestures**: Added support for long press and swipe gestures

### ⚙️ Core function enhancement
- **Agent Configuration Isolation**: Implemented model configuration isolation between different agents, avoiding configuration conflicts
- **Execution Option Extension**: Added `useCache` and `replanningCycleLimit` configuration options for Agent, providing finer-grained control
- **YAML Script Support**: Support for running universal custom devices through YAML scripts, enhancing automation capabilities

### 🐞 Bug fixes
- Fixed Qwen model search region size issues
- Optimized deepThink parameter handling and rectangle size calculation
- Resolved issues related to Playwright double-click operations
- Improved TEXT action type processing logic

### 📚 Documentation and community
- Added custom interface documentation to help developers better extend functionality
- Added [Awesome Midscene](./awesome-midscene.md) section in README to showcase community projects

## v0.27 - 🧠 Core module refactoring, assertion and report enhancements

### ⚠️ Core module refactoring

Building on v0.26, which introduced [Rslib](https://github.com/web-infra-dev/rslib) to improve the development experience and lower the contribution barrier, v0.27 goes a step further with a large-scale refactoring of the core modules. This makes it much easier to support new devices and add new AI operations, and we sincerely welcome community developers to contribute!

**Due to the wide scope of this refactoring, please feel free to report any issues you encounter after upgrading, and we will address them promptly.**

### 🌐 API enhancement

- **`aiAssert` Enhancements**
  - New `name` field allows naming different assertion tasks, making it easier to identify and parse in JSON output results
  - New `domIncluded` and `screenshotIncluded` options allow flexible control over whether to send DOM snapshots and page screenshots to AI

### 🤖 Chrome extension playground upgrade

- All Agent APIs can now be debugged and run directly in the Playground! Covering all three major categories of methods (interaction, extraction, and verification), its visual operation and verification boost your automation development efficiency. Come experience a truly versatile AI automation platform! 🚀

### 📊 Report function optimization
- **New Marking Layer Switch**: The report player now has a switch to hide the marking layer, so you can view the original page unobstructed during playback.

### 🐞 Bug fixes
- Fixed an issue where `aiWaitFor` sometimes prevented the report from being generated
- Reduced memory consumption of Playwright plugin


## v0.26 - 🚀 Toolchain fully migrated to [Rslib](https://github.com/web-infra-dev/rslib), greatly improving the development experience and lowering the contribution barrier

### 🌐 Web integration optimization
- Support freezing the page context ([freezePageContext](./api.mdx#agentfreezepagecontext)/[unfreezePageContext](./api.mdx#agentunfreezepagecontext)), so that all subsequent operations reuse the same page snapshot instead of repeatedly capturing the page state
- Added all agent APIs to the Playwright fixture, simplifying test scripts and fixing missing reports when using `agentForPage`
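
Conceptually, freezing swaps "capture a fresh snapshot for every operation" for "reuse one snapshot until unfrozen". The `PageContextCache` class below is a hypothetical model of that behavior, not Midscene's code; in practice you call `agent.freezePageContext()` and `agent.unfreezePageContext()` as linked above.

```typescript
// Hypothetical model of freeze/unfreeze; not Midscene's implementation.
interface PageSnapshot {
  id: number; // stand-in for the screenshot and DOM data
}

class PageContextCache {
  captures = 0;
  private frozen: PageSnapshot | undefined;
  private nextId = 1;

  // Expensive in reality: screenshot plus DOM extraction.
  private capture(): PageSnapshot {
    this.captures += 1;
    return { id: this.nextId++ };
  }

  freeze(): void {
    this.frozen = this.capture();
  }

  unfreeze(): void {
    this.frozen = undefined;
  }

  // Every operation asks for the context; while frozen, the same snapshot is reused.
  get(): PageSnapshot {
    return this.frozen ?? this.capture();
  }
}

const ctx = new PageContextCache();
ctx.freeze();
const first = ctx.get();
const second = ctx.get();
console.log(first === second, ctx.captures); // true 1
ctx.unfreeze();
ctx.get();
console.log(ctx.captures); // 2
```

The trade-off is staleness: while frozen, operations will not see page changes, so unfreeze once the page is expected to update.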

### 📱 Android automation enhancement
- New keyboard hiding strategy ([keyboardDismissStrategy](./integrate-with-android.html#androiddevice-constructor)), letting you specify how the keyboard is automatically hidden

### 📊 Report function optimization
- Lazy parsing of report content, preventing crashes when the report is large
- The report player adds an auto-zoom toggle, making it easier to view playback from a global perspective
- Report playback now supports aiAssert / aiQuery tasks, showing the entire page change process
- Fix the sidebar status not showing a failure icon when an assertion fails
- Fix the drop-down filter in the report not switching

### 🚀 Build and engineering
- Migrated the build tool to the [Rslib](https://github.com/web-infra-dev/rslib) library development tool, improving build efficiency and development experience
- Source-code jumping across the whole repository, making it easier for developers to navigate the source
- Reduced the MCP npm package size from 56 MB to 30 MB, greatly improving loading speed

### 🐞 Bug fixes
- The CLI automatically enables headed mode when `keepWindow` is true
- Fix the `getGlobalConfig` implementation, which caused abnormal environment variable initialization
- Ensure the MIME type in base64 encoding is correct
- Fix the return value type of the `aiAssert` task

## v0.25 - 🚀 Support using images as AI prompt input

### 🎯 Core function enhancement
- New worker runtime support, allowing Midscene to run in worker environments
- Support using images as AI prompt input, see [Prompting with images](./api.mdx#prompting-with-images)
- Image processing upgrade, using Photon & Sharp for efficient image cropping

### 🌐 Web integration optimization
- Get XPath from coordinates, improving cache reproducibility
- The plan section now comes first in cache files, improving readability
- Chrome Recorder supports exporting all events to markdown documents
- The agent supports specifying the HTML report name, see [reportFileName](./api.mdx)

### 📱 Android automation enhancement
- Long press gesture support
- Pull-to-refresh support

### 🐞 Bug fixes
- Use global config to handle environment variables, avoiding issues caused by the package being bundled more than once
- Manually construct error information when error object serialization fails
- Fix the declaration order of Playwright report type dependencies
- Fix an MCP packaging issue

### 📚 Documentation AI-friendly
- [LLMs.txt](./llm-txt.mdx) is now available in both Chinese and English, making it easier for AI to understand
- Each document now has a copy-to-markdown button, making it easier to feed to AI

### 🤖 Other function enhancement
- Chrome Recorder supports aiScroll function
- Refactor `aiAssert` to be consistent with `aiBoolean`

## v0.24 - 🤖 MCP for Android automation

### 🚀 MCP for Android automation
- You can now use Midscene MCP to automate Android apps, just like you use it for web apps. Read more: [MCP for Android Automation](./mcp-android.mdx)

### 🌐 Optimization
- For Puppeteer on macOS, added a double input-clearing mechanism to ensure the input box is empty before typing

### 🔧 Development experience
- Simplify how `htmlElement.js` is built to avoid report template build issues caused by circular dependencies
- Streamline the development workflow: just run `npm run dev` to start developing the Midscene project


## v0.23 - 📊 New report style and YAML script enhancements

### 🎨 Report system upgrade
#### New report style
- New report style design, providing clearer and more beautiful test result display
- Optimize report layout and visual effects, improve user reading experience
- Enhance report readability and information hierarchy structure

![](https://lf3-static.bytednsdoc.com/obj/eden-cn/nupipfups/Midscene/new%20report.png)

### ⚙️ YAML script enhancements

#### Support multiple YAML files batch execution
- New config mode for configuring the YAML file execution order, browser reuse strategy, and parallelism
- Support for getting run results in JSON format

![](https://lf3-static.bytednsdoc.com/obj/eden-cn/nupipfups/Midscene/Tuji_20250722_161353.338.png)

### 🧪 Test coverage enhancement

#### Android test enhancement
- New Android platform related test cases, improve code quality and stability
- Improve test coverage, ensure the reliability of Android features

## v0.22 - 🎬 Chrome extension recording function released

### 🌐 Web integration enhancement

#### 1️⃣ New recording function
- Chrome extension adds recording function, which can record user operations on the page and generate automation scripts
- Support recording click, input, scroll and other common operations, greatly reducing the threshold for writing automation scripts
- The recorded operations can be directly played back and debugged in the Playground

#### 2️⃣ Upgrade to IndexedDB for storage
- Chrome extension's Playground and Bridge now use IndexedDB for data storage
- Compared to the previous storage scheme, it provides larger storage capacity and better performance
- Support storing more complex data structures, laying the foundation for future feature extensions

#### 3️⃣ Customize replanning cycle limit
- Set the `MIDSCENE_REPLANNING_CYCLE_LIMIT` environment variable to customize the maximum number of re-planning cycles allowed when executing operations (aiAction).
- The default value is 10. When the AI needs to re-plan more times than this limit, an error is thrown suggesting that you split the task.
- Provide more flexible task execution control, adapting to different automation scenarios

```bash
export MIDSCENE_REPLANNING_CYCLE_LIMIT=10 # default value is 10
```
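
The limit can be pictured as a bounded plan-execute loop. The sketch reads `MIDSCENE_REPLANNING_CYCLE_LIMIT` with the documented default of 10 and throws once the budget is spent. `runWithReplanning`, `readCycleLimit` and the step callback are hypothetical, and whether the first plan counts toward the limit is an assumption here.

```typescript
// Hypothetical plan-execute loop; not Midscene's source code.
// A step returns true once the task is complete.
type PlanStep = (cycle: number) => boolean;

function readCycleLimit(env: Record<string, string | undefined>): number {
  const parsed = Number.parseInt(env.MIDSCENE_REPLANNING_CYCLE_LIMIT ?? '', 10);
  return Number.isInteger(parsed) && parsed > 0 ? parsed : 10; // fall back to the default of 10
}

function runWithReplanning(step: PlanStep, limit: number): number {
  for (let cycle = 1; cycle <= limit; cycle++) {
    if (step(cycle)) return cycle; // finished: report how many cycles were used
  }
  throw new Error(
    `Task not finished after ${limit} planning cycles; consider splitting it into smaller tasks`,
  );
}

const limit = readCycleLimit({ MIDSCENE_REPLANNING_CYCLE_LIMIT: '20' });
console.log(limit); // 20
console.log(runWithReplanning((cycle) => cycle === 3, limit)); // 3
```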

### 📱 Android interaction optimization

#### 1️⃣ New screenshot path generation
- Generate a unique file path for each screenshot to avoid file overwrite issues
- Improve stability in concurrent test scenarios
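
One common way to get collision-free paths is to combine a timestamp with a random UUID, sketched here with Node's built-in `crypto.randomUUID`. The `uniqueScreenshotPath` helper and the file-name pattern are hypothetical and not Midscene's actual naming scheme.

```typescript
import { randomUUID } from 'node:crypto';
import { join } from 'node:path';

// Build a per-screenshot path so concurrent runs never overwrite each other.
function uniqueScreenshotPath(dir: string, now: Date = new Date()): string {
  const stamp = now.toISOString().replace(/[:.]/g, '-'); // filesystem-safe timestamp
  return join(dir, `screenshot-${stamp}-${randomUUID()}.png`);
}

const p1 = uniqueScreenshotPath('/tmp/midscene');
const p2 = uniqueScreenshotPath('/tmp/midscene');
console.log(p1 !== p2); // true
```

The timestamp keeps files roughly sortable by capture time, while the UUID guarantees uniqueness even when two screenshots land in the same millisecond.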

## v0.21 - 🎨 Chrome extension UI upgrade

### 🌐 Web integration enhancement

#### 1️⃣ New chat-style user interface

- New chat-style user interface design for better user experience

<video src="https://lf3-static.bytednsdoc.com/obj/eden-cn/nupipfups/Midscene/recording_2025-07-07_08-16-16.mp4" controls/>

#### 2️⃣ Flexible timeout configuration

- Supports overriding timeout settings from test fixture, providing more flexible timeout control
- Applicable scenarios: Different test cases require different timeout settings

#### 3️⃣ Unified Puppeteer and Playwright configuration

- New `waitForNavigationTimeout` and `waitForNetworkIdleTimeout` parameters for Playwright
- Unified timeout options configuration for Puppeteer and Playwright, providing consistent API experience, reducing learning costs

#### 4️⃣ New data export callback mechanism

- New `agent.onDumpUpdate` callback function, can get real-time notification when data is exported
- Refactored the post-task processing flow to ensure the correct execution of asynchronous operations
- Applicable scenarios: Monitoring or processing exported data
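
A minimal sketch of the callback pattern: a hypothetical `TaskRunner` calls `onDumpUpdate` with the serialized dump after each task completes. Only the callback name comes from this changelog; the runner, the `Dump` shape and the synchronous call are assumptions.

```typescript
// Hypothetical runner illustrating an onDumpUpdate-style hook; not Midscene's agent.
interface Dump {
  tasks: string[];
}

class TaskRunner {
  onDumpUpdate?: (dump: string) => void;
  private dump: Dump = { tasks: [] };

  run(task: string): void {
    this.dump.tasks.push(task);
    // Notify the listener once the task's post-processing is complete.
    this.onDumpUpdate?.(JSON.stringify(this.dump));
  }
}

const runner = new TaskRunner();
const received: string[] = [];
runner.onDumpUpdate = (dump) => received.push(dump);
runner.run('tap login');
runner.run('input username');
console.log(received.length); // 2
```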

### 📱 Android interaction optimization

#### 1️⃣ Input experience improvement

- Changed click input to slide operation, improving interaction response and stability
- Reduced operation failures caused by inaccurate clicks

## v0.20 - Support for assigning XPath to locate elements

### 🌐 Web integration enhancement

#### 1️⃣ New `aiAsk` method

- Allows direct questioning of the AI model to obtain string-formatted answers for the current page.
- Applicable scenarios: Tasks requiring AI reasoning such as Q&A on page content and information extraction.
- Example:

```typescript
await agent.aiAsk('any question')
```

#### 2️⃣ Support for passing XPath to locate elements

- Location priority: Specified XPath > Cache > AI model location.
- Applicable scenarios: When the XPath of an element is known and the AI model location needs to be skipped.
- Example:

```typescript
await agent.aiTap('submit button', { xpath: '//button[@id="submit"]' })
```
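
The priority order above can be sketched as a resolver that tries each source in turn. The `LocateSources` shape and `locate` function are hypothetical; falling back to the next source when the XPath matches nothing is also an assumption, not documented behavior.

```typescript
// Hypothetical resolver illustrating "XPath > cache > AI"; not Midscene's code.
interface Coord {
  x: number;
  y: number;
}

interface LocateSources {
  byXPath?: (xpath: string) => Coord | undefined; // direct DOM lookup, cheapest
  fromCache?: (prompt: string) => Coord | undefined; // result recorded by an earlier run
  byAI: (prompt: string) => Coord; // model-based location, slowest
}

interface LocateResult {
  coord: Coord;
  source: 'xpath' | 'cache' | 'ai';
}

function locate(prompt: string, sources: LocateSources, xpath?: string): LocateResult {
  if (xpath !== undefined && sources.byXPath) {
    const hit = sources.byXPath(xpath);
    if (hit) return { coord: hit, source: 'xpath' };
  }
  const cached = sources.fromCache?.(prompt);
  if (cached) return { coord: cached, source: 'cache' };
  return { coord: sources.byAI(prompt), source: 'ai' };
}

const sources: LocateSources = {
  byXPath: (xp) => (xp === '//button[@id="submit"]' ? { x: 10, y: 20 } : undefined),
  byAI: () => ({ x: 12, y: 22 }),
};

console.log(locate('submit button', sources, '//button[@id="submit"]').source); // xpath
console.log(locate('submit button', sources).source); // ai
```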

### 📱 Android improvement

#### 1️⃣ Playground tasks can be cancelled

- Supports interrupting ongoing automation tasks to improve debugging efficiency.

#### 2️⃣ Enhanced `aiLocate` API

- Returns the Device Pixel Ratio, which is commonly used to calculate the real coordinates of elements.
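
As a worked example of the arithmetic, assuming the located rectangle is expressed in logical (CSS) pixels, multiplying by the DPR gives physical screen pixels. The `LogicalRect` shape and `physicalCenter` helper are assumptions, not the exact `aiLocate` return type.

```typescript
// Hypothetical rect shape in logical pixels; not the exact aiLocate result type.
interface LogicalRect {
  left: number;
  top: number;
  width: number;
  height: number;
}

// Center of the rect, converted from logical to physical pixels.
function physicalCenter(rect: LogicalRect, dpr: number): { x: number; y: number } {
  return {
    x: Math.round((rect.left + rect.width / 2) * dpr),
    y: Math.round((rect.top + rect.height / 2) * dpr),
  };
}

// Center is (140, 220) in logical pixels; at DPR 3 that is (420, 660) physical pixels.
console.log(physicalCenter({ left: 100, top: 200, width: 80, height: 40 }, 3));
// { x: 420, y: 660 }
```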

### 📈 Report generation optimization

Improved the report generation mechanism by switching from batch storage to appending entries one at a time, which effectively reduces memory usage and avoids out-of-memory errors when there are many test cases.


## v0.19 - Support for getting complete execution process data

### New API for getting Midscene execution process data

Added the `_unstableLogContent` API to the agent. It returns Midscene's execution process data, including the timing of each step, the AI tokens consumed, and screenshots.

The report is generated based on this data, which means you can customize your own report using this data.

Read more: [API documentation](./api.mdx#agent_unstablelogcontent)
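
Since the report is built from this data, a custom summary is largely a matter of walking the recorded steps. The `LogStep` shape below is a hypothetical simplification; the real structure returned by `_unstableLogContent` is described in the API documentation and will differ.

```typescript
// Hypothetical, simplified step record; the real log format differs.
interface LogStep {
  name: string;
  startTime: number; // ms
  endTime: number; // ms
  totalTokens?: number; // absent when the step made no AI call
}

function summarize(steps: LogStep[]): { steps: number; durationMs: number; tokens: number } {
  let durationMs = 0;
  let tokens = 0;
  for (const step of steps) {
    durationMs += step.endTime - step.startTime;
    tokens += step.totalTokens ?? 0;
  }
  return { steps: steps.length, durationMs, tokens };
}

console.log(
  summarize([
    { name: 'Plan', startTime: 0, endTime: 1200, totalTokens: 950 },
    { name: 'Tap', startTime: 1200, endTime: 1500 },
  ]),
); // { steps: 2, durationMs: 1500, tokens: 950 }
```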

### CLI support for adjusting Midscene env variable priority

By default, `dotenv` does not let variables in the `.env` file override existing global environment variables. To override them, use the `--dotenv-override` option.

Read more: [Use YAML-based Automation Scripts](./automate-with-scripts-in-yaml.mdx#use-env-file-to-override-global-environment-variables)
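
The precedence rule can be written as a small merge: by default, keys already set in the process environment win, and with override enabled the `.env` values replace them. `mergeEnv` is a hypothetical helper that mirrors dotenv's `override` option; it is not the CLI's actual code.

```typescript
// Hypothetical helper mirroring dotenv's override semantics; not the CLI's code.
type Env = Record<string, string | undefined>;

function mergeEnv(processEnv: Env, fileEnv: Env, override: boolean): Env {
  const result: Env = { ...processEnv };
  for (const [key, value] of Object.entries(fileEnv)) {
    // Default: only fill in keys that are not already set globally.
    if (override || result[key] === undefined) {
      result[key] = value;
    }
  }
  return result;
}

const globalEnv = { MIDSCENE_MODEL_NAME: 'global-model' };
const dotEnvFile = { MIDSCENE_MODEL_NAME: 'file-model', OPENAI_API_KEY: 'sk-test' };
console.log(mergeEnv(globalEnv, dotEnvFile, false).MIDSCENE_MODEL_NAME); // global-model
console.log(mergeEnv(globalEnv, dotEnvFile, true).MIDSCENE_MODEL_NAME); // file-model
```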

### Reduce report file size

Trimmed redundant data from generated reports. For complex pages, the typical report file size dropped from 47.6M to 15.6M!

## v0.18 - Enhanced reporting features

🚀 Midscene has another update! It makes your testing and automation processes even more powerful:

### Custom node in report

* Added the `logScreenshot` API to the agent. It takes a screenshot of the current page as a report node and supports setting the node title and description, making the automated testing process more intuitive. Useful for capturing key steps, error states, UI validation, etc.

![](/blog/logScreenshot-api.png)

* Example:

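```javascript
// ./fixture is assumed to export Playwright's `test` extended with Midscene's AI fixture
import { test } from './fixture';

// Assumption: a longer timeout is used when running with the cache enabled
const CACHE_TIME_OUT = process.env.MIDSCENE_CACHE;

test.beforeEach(async ({ page }) => {
  await page.goto('https://github.com/');
});

test('login github', async ({ ai, aiAssert, aiInput, logScreenshot }) => {
  if (CACHE_TIME_OUT) {
    test.setTimeout(200 * 1000);
  }
  await ai('Click the "Sign in" button');
  await aiInput('quanru', 'username');
  await aiInput('123456', 'password');

  // Add a custom screenshot node, with a title and description, to the report
  await logScreenshot('Login page', {
    content: 'Username is quanru, password is 123456',
  });

  await ai('Click the "Sign in" button');
  await aiAssert('Login success');
});
```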

### Support for downloading reports as videos

* Support direct video download from the report player, just by clicking the download button on the player interface.

![](/blog/export-video.png)

* Applicable scenarios: Share test results, archive reproduction steps, and demonstrate problem reproduction.

### More Android configurations exposed

* Optimize input interactions in Android apps and allow connecting to remote Android devices

  * `autoDismissKeyboard?: boolean` - Optional parameter. Whether to automatically dismiss the keyboard after entering text. The default value is true.

  * `androidAdbPath?: string` - Optional parameter. Used to specify the path of the adb executable file.

  * `remoteAdbHost?: string` - Optional parameter. Used to specify the remote adb host.

  * `remoteAdbPort?: number` - Optional parameter. Used to specify the remote adb port.

* Examples:

```typescript
await agent.aiInput('Test Content', 'Search Box', { autoDismissKeyboard: true })
```

```typescript
import { agentFromAdbDevice } from '@midscene/android';

const agent = await agentFromAdbDevice('s4ey59', {
    autoDismissKeyboard: false, // Optional parameter. Whether to automatically dismiss the keyboard after entering text. The default value is true.
    androidAdbPath: '/usr/bin/adb', // Optional parameter. Used to specify the path of the adb executable file.
    remoteAdbHost: '192.168.10.1', // Optional parameter. Used to specify the remote adb host.
    remoteAdbPort: 5037, // Optional parameter. Used to specify the remote adb port (a number).
})
```

Upgrade now to experience these powerful new features! 

* [Custom Report Node API documentation](/en/api.mdx#log-screenshot)

* [API documentation for more Android configuration items](/en/integrate-with-android.mdx#androiddevice-constructor)


## v0.17 - Let AI see the DOM of the page

### Data query API enhanced

To meet more automation and data extraction scenarios, the following APIs now accept an `options` parameter, giving more flexible control over whether DOM information and screenshots are sent to the AI model:

- `agent.aiQuery(dataDemand, options)`
- `agent.aiBoolean(prompt, options)`
- `agent.aiNumber(prompt, options)`
- `agent.aiString(prompt, options)`

#### New `options` parameter

- `domIncluded`: Whether to pass the simplified DOM information to AI model, default is off. This is useful for extracting attributes that are not visible on the page, like image links.
- `screenshotIncluded`: Whether to pass the screenshot to AI model, default is on.

#### Code example

```typescript
// Extract all contact information (including hidden avatarUrl attributes)
const contactsData = await agent.aiQuery(
  "{name: string, id: number, company: string, department: string, avatarUrl: string}[], extract all contact information including hidden avatarUrl attributes",
  { domIncluded: true }
);

// Check if the id attribute of the first contact is 1
const isId1 = await agent.aiBoolean(
  "Is the first contact's id 1?",
  { domIncluded: true }
);

// Get the ID of the first contact (hidden attribute)
const firstContactId = await agent.aiNumber("First contact's id?", { domIncluded: true });

// Get the avatar URL of the first contact (invisible attribute on the page)
const avatarUrl = await agent.aiString(
  "What is the Avatar URL of the first contact?",
  { domIncluded: true }
);
```

### New right-click ability

Have you ever encountered a scenario where you need to automate a right-click operation? Now, Midscene supports a new `agent.aiRightClick()` method!

#### Function

Perform a right-click operation on the specified element, suitable for scenarios where right-click events are customized on web pages. Please note that Midscene cannot interact with the browser's native context menu after right-click.

#### Parameter description

- `locate`: Describe the element you want to operate in natural language
- `options`: Optional, supports `deepThink` (AI fine-grained positioning) and `cacheable` (result caching)

#### Example

```typescript
// Right-click on a contact in the contacts application, triggering a custom context menu
await agent.aiRightClick("Alice Johnson");

// Then you can click on the options in the menu
await agent.aiTap("Copy Info"); // Copy contact information to the clipboard
```

### A complete example

In this report file, we show a complete example of using the new `aiRightClick` API and new query parameters to extract contact data including hidden attributes.

Report file: [puppeteer-2025-06-04_20-34-48-zyh4ry4e.html](https://lf3-static.bytednsdoc.com/obj/eden-cn/nupipfups/Midscene/puppeteer-2025-06-04_20-34-48-zyh4ry4e.html)

The corresponding code can be found in our example repository: [puppeteer-demo/extract-data.ts](https://github.com/web-infra-dev/midscene-example/blob/main/puppeteer-demo/extract-data.ts)


### Refactor cache

Cache XPaths instead of coordinates, improving the cache hit rate.

Changed the cache file format from JSON to YAML, improving readability.

## v0.16 - Support MCP

### Midscene MCP

🤖 Use Cursor / Trae to help write test cases. 
🕹️ Quickly implement browser operations akin to the Manus platform. 
🔧 Integrate Midscene capabilities swiftly into your platforms and tools.

<video src="https://lf3-static.bytednsdoc.com/obj/eden-cn/ozpmyhn_lm_hymuPild/ljhwZthlaukjlkulzlp/midscene/en-midscene-mcp-Sauce-Demo.mp4" controls/>

Read more: [MCP](./web-mcp.mdx)

### Support structured API for agent

APIs: `aiBoolean`, `aiNumber`, `aiString`, `aiLocate`

Read more: [Use JavaScript to Optimize the AI Automation Code](./blog-programming-practice-using-structured-api.md)

## v0.15 - Android automation unlocked!

### Android automation unlocked!

🤖 AI Playground: natural‑language debugging
📱 Supports native, Lynx & WebView apps
🔁 Replayable runs
🛠️ YAML or JS SDK
⚡ Auto‑planning & Instant Actions APIs

Read more: [Android automation](./blog-support-android-automation.mdx)

### More features

* Allow customizing the `midscene_run` directory
* Enhance report filename generation with unique identifiers and support split mode
* Enhance timeout configurations and logging for network idle and navigation
* Adapt for gemini-2.5-pro

## v0.14 - Instant actions

"Instant Actions" introduces new atomic APIs, enhancing the accuracy of AI operations. 

Read more: [Instant Actions](./blog-introducing-instant-actions-and-deep-think.md)

## v0.13 - DeepThink mode

### Atomic AI interaction methods

* Supports aiTap, aiInput, aiHover, aiScroll, and aiKeyboardPress for precise AI actions.

### DeepThink mode

* Enhances click accuracy with deeper contextual understanding.

![](/blog/0.13.jpeg)

## v0.12 - Integrate Qwen 2.5 VL

### Integrate Qwen 2.5 VL's native capabilities

* Keeps output accuracy. 
* Supports more element interactions. 
* Cuts operating cost by over 80%.

## v0.11.0 - UI-TARS model caching

### **✨ UI-TARS model support caching**

* See the documentation to enable caching 👉: [Enable Caching](./caching.mdx)

* Effect with caching enabled:

<video src="https://lf3-static.bytednsdoc.com/obj/eden-cn/nupipfups/Midscene/antd-form-cache.mp4" controls/>

![](/blog/0.11.0.png)

### **✨ Optimize DOM tree extraction strategy**

* Optimize the information extracted from the DOM tree, speeding up inference for models such as GPT-4o

![](/blog/0.11.0-2.png)


## v0.10.0 - UI-TARS model released

UI-TARS is a native GUI agent model released by the **Seed** team. It is named after the [TARS robot](https://interstellarfilm.fandom.com/wiki/TARS) from the movie [Interstellar](https://en.wikipedia.org/wiki/Interstellar_(film)), a robot with high intelligence and autonomous thinking capabilities. UI-TARS **takes images and human instructions as input**, accurately perceives the next action, and progressively works toward the goal of the instructions, achieving the best performance in various GUI automation benchmarks compared with both open-source and closed-source commercial models.

![](/blog/0.10.0.png)

UI-TARS: Pioneering Automated GUI Interaction with Native Agents - Figure 1

![](/blog/0.10.0-2.png)

UI-TARS: Pioneering Automated GUI Interaction with Native Agents - Figure 4

### **✨** Model advantages

UI-TARS has the following advantages in GUI tasks:

* **Target-driven**

* **Fast inference speed**

* **Native GUI agent model**

* **Private deployment without data security issues**


## v0.9.0 - Bridge mode released

With the Midscene browser extension, you can now connect scripts to your desktop browser for automated operations!

We call it "Bridge Mode".

Compared with debugging in a CI environment, the advantages are:

1. You can reuse your desktop browser, especially its cookies, login state, and front-end interface state, and start automating without worrying about environment setup.

2. Manual operations and scripts can work together, making automation tools more flexible.

3. For simple business regression testing, just run it locally with Bridge Mode.

![](/blog/0.9.0.png)

Documentation: [Use Chrome Extension to Experience Midscene](./bridge-mode-by-chrome-extension.mdx)


## v0.8.0 - Chrome extension

### **✨ New Chrome extension, run Midscene anywhere**

Through the Midscene browser extension, you can run Midscene on any page, without writing any code.

Experience it now 👉: [Use Chrome Extension to Experience Midscene](./quick-experience.mdx)

<video src="https://lf3-static.bytednsdoc.com/obj/eden-cn/nupipfups/Midscene/Midscene_extension.mov" controls/>



## v0.7.0 - Playground ability

### **✨ Playground ability, debug anytime**

Now you don't have to keep re-running scripts to debug prompts!

On the new test report page, you can debug the AI execution results at any time, including page operations, page information extraction, and page assertions.

<video src="https://lf3-static.bytednsdoc.com/obj/eden-cn/nupipfups/Midscene/midscene-playground.mov" controls/>


## v0.6.0 - Doubao model support

### **✨ Doubao model support**

* Support for calling Doubao models. Configure the environment variables below to try it.

```bash
MIDSCENE_OPENAI_INIT_CONFIG_JSON='{"baseURL":"https://xxx.net/api/v3","apiKey":"xxx"}'
MIDSCENE_MODEL_NAME='ep-20240925111815-mpfz8'
MIDSCENE_MODEL_TEXT_ONLY='true'
```

Summary of Doubao model availability:

* Currently, Doubao only offers text-only models, which means "seeing" is not available. They perform well in scenarios where reasoning relies on text alone.

* Use cases that require UI (visual) analysis are not supported at all.


Example:

✅ The price of a multi-meat grape (can be guessed from the order of the text on the interface)

✅ The language switch text button (can be guessed from the text content on the interface: Chinese, English text)

❌ The play button at the bottom left (requires image understanding, failed)

### **✨ Support for GPT-4o structured output, cost reduction**

By using the gpt-4o-2024-08-06 model, Midscene now supports structured output, improving stability and reducing costs by more than 40%.

Midscene can now also hit GPT-4o prompt caching, and the cost of AI calls will continue to decrease as deployment on the company's GPT platform progresses.

### **✨ Test report: support animation playback**

Now you can view an animated playback of each step in the test report to quickly debug your script

<video src="https://lf3-static.bytednsdoc.com/obj/eden-cn/nupipfups/Midscene/midscene-play-all.mp4" controls/>

### **✨ Speed up: merge plan and locate operations, response speed increased by 30%**

In the new version, we partially merged the Plan and Locate steps of prompt execution, increasing AI response speed by 30%.

> Before

![](/blog/0.6.0.png)

> After

![](/blog/0.6.0-2.png)

### **✨ Test report: the accuracy of different models**

* GPT-4o series models: 100% accuracy

* doubao-pro-4k (text-only model): close to usable

![](/blog/0.6.0-3.png)

![](/blog/0.6.0-4.png)

### **🐞** Problem fix

* Optimize page information extraction to avoid collecting obscured elements, improving success rate and speed and reducing AI call cost 🚀

> Before

![](/blog/0.6.0-5.png)

> After

![](/blog/0.6.0-6.png)


## v0.5.0 - Support GPT-4o structured output

### **✨ New features**

* Support for the gpt-4o-2024-08-06 model, which guarantees 100% JSON-formatted output and reduces hallucinations in Midscene task planning

![](/blog/0.5.0.png)

* Real-time visualization of AI behavior in Playwright, improving troubleshooting efficiency

![](/blog/0.5.0-2.png)

* Generalized caching: caching is no longer limited to Playwright; pagepass and Puppeteer can use it too

```diff
- playwright test --config=playwright.config.ts
# Enable cache
+ MIDSCENE_CACHE=true playwright test --config=playwright.config.ts
```

* Support for Azure OpenAI

* Support for AI to append to, delete from, and modify existing input

### **🐞** Problem fix

* Optimize page information extraction to avoid collecting obscured elements, improving success rate and speed and reducing AI call cost 🚀

* During the AI interaction process, unnecessary attribute fields were trimmed, reducing token consumption.

* Optimize the AI interaction process to reduce the likelihood of hallucination in KeyboardPress and Input events

* For pagepass, provide a workaround for the flickering that occurs while Midscene runs

```javascript
// pagepass currently depends on an outdated Puppeteer version, which may cause the interface to flicker and the cursor to be lost.
// Wrapping screenshot() to disable captureBeyondViewport works around this.
const originScreenshot = puppeteerPage.screenshot;
puppeteerPage.screenshot = async (options) => {
  return await originScreenshot.call(puppeteerPage, {
    ...options,
    captureBeyondViewport: false
  });
};
```

## v0.4.0 - Support CLI usage

### **✨ New features**

* Support for CLI usage, lowering the barrier to using Midscene

```bash
# Headed mode (visible browser): visit baidu.com and search for "weather"
npx @midscene/cli --headed --url https://www.baidu.com --action "input 'weather', press enter" --sleep 3000

# Visit GitHub status page and save the status to ./status.json
npx @midscene/cli --url https://www.githubstatus.com/ \
  --query-output status.json \
  --query '{serviceName: string, status: string}[], github page status, return service name'
```

* Support for AI to wait for a specified time before continuing with subsequent tasks

* Playwright AI task report shows the overall time and aggregates AI tasks by test group

### **🐞** Problem fix

* Optimize the AI interaction process to reduce the likelihood of hallucination in KeyboardPress and Input events


## v0.3.0 - Support AI report HTML

### **✨ New features**

* Generate AI reports in HTML format, aggregating AI tasks by test group to make test reports easy to distribute

### **🐞** Problem fix

* Fix scrolling preview in the AI report

## v0.2.0 - Control Puppeteer by natural language

### **✨ New features**

* Support for using natural language to control Puppeteer for page automation 🗣️💻

* Provide AI caching for the Playwright framework, improving stability and execution efficiency

* AI report visualization, aggregate AI tasks by test group, facilitate test report distribution

* Support for AI to assert the page, let AI judge whether the page meets certain conditions

## v0.1.0 - Control Playwright by natural language

### **✨ New features**

* Support for using natural language to control Playwright for page automation 🗣️💻

* Support for using natural language to extract page information 🔍🗂️

* AI report visualization, AI behavior, AI thinking visualization 🛠️👀

* Direct use of GPT-4o model, no training required 🤖🔧
